Abstract

We present a source inversion technique for chemical constituents that uses as- 
similated constituent observations rather than directly using the observations. The 
method is tested with a simple model problem, which is a two-dimensional Fourier- 
Galerkin transport model combined with a Kalman filter for data assimilation. Inver- 
sion is carried out using a Green’s function method and observations are simulated 
from a true state with added Gaussian noise. The forecast state uses the same
spectral model, but differs by an unbiased Gaussian model error and by emissions
models with constant errors. The numerical experiments employ both simulated in
situ and satellite observation networks. Source inversion was carried out by either 
direct use of synthetically generated observations with added noise, or by first as- 
similating the observations and using the analyses to extract observations. We have 
conducted 20 identical twin experiments for each set of source and observation con- 
figurations, and find that in the limiting cases of a very few localized observations, 
or an extremely large observation network there is little advantage to carrying out 
assimilation first. However, at intermediate observation densities, the source
inversion error standard deviation decreases by 50% to 95% when the Kalman filter
algorithm is applied first, followed by Green’s function inversion.

1. Introduction

Understanding the terrestrial carbon cycle is of prime importance to predicting the 
evolution of climate and ecosystems. It is particularly useful to gain knowledge of the 
fluxes of carbon species between land and atmosphere and ocean and atmosphere; 
without this knowledge, an understanding of the physical and biological processes 
that govern the present-day carbon budget cannot be attained. This means in turn 
that there is little chance of accurate prediction of the future climate. There are two 
predominant approaches to deducing these fluxes, or source-sink distributions. One of
them, the “bottom-up” method, uses models of ocean biogeochemistry or land ecosystems
along with data constraints (meteorological analyses and relevant biophysical
parameters, such as leaf-area index deduced from satellite data). Examples of such
bottom-up approaches include Tucker et al. [1986] and Randerson et al. [2004].
The second, “top-down” approach uses atmospheric concentration measurements in 
conjunction with transport fields (winds, cloud mass fluxes and diffusivity) deduced 
from atmospheric analyses or models. Both approaches are subject to uncertainty, 
associated with model error, analysis uncertainty, and characteristics of the various 
types of observations. Limitations in the observations include sparse sampling of in- 
homogeneous quantities and the inherent averaging involved in deducing quantities 
of physical relevance (e.g., concentrations) from measurements (e.g., radiances). 

A number of inverse modeling studies have used surface concentration measurements
from a sparse global network to deduce fluxes for a small number (about 12) of
continental- or basin-sized regions. For example, Gurney et al. [2005] examine some
uncertainties in this method by analyzing differences between deduced fluxes among
inverse models that employed different wind fields. While the continental-scale flux
estimates were in reasonable agreement in regions with a few data sources, there
was much uncertainty in unconstrained regions, as may be expected. Petron et al.
[2002] used synthesis inversion to estimate time-dependent CO fluxes using
CMDL (now called NOAA/ESRL) surface station data. A number of other studies
[e.g., Rayner and O’Brien, 2001] have considered the utility of trace-gas constraints
derived from space-based instruments, which offer vastly enhanced data coverage:
potentially thousands of soundings per day, compared to tens of observations from
in situ instruments.

Inverse methods for estimating chemical sources and sinks generally use either dif- 
ferential (and deterministic) or integral (and Bayesian) methods. Differential meth- 
ods use a mass balance to solve for the chemical sources, and therefore require con- 
stituent observations on a regular grid. Bayesian methods involve the minimization 
of a cost function and can employ Green’s functions [Tarantola, 1987; Enting, 2000;
Petron, 2004], adjoint methods [Kaminski et al., 1999; Kopacz et al., 2008], or ensemble
Kalman filter methods [Peters et al., 2005]. Green’s functions are defined as the set
of observed constituent values that would be expected given a unit source at a single
region (or grid point) using a chemical transport model (which includes estimated
sources and sinks). The actual observations are then used to invert the resulting
system to calculate a new source/sink estimate. In global models, there are generally
too many gridpoints to define a Green’s function for each one, so synthesis inversion
is used, in which the sources are defined in terms of larger emissions regions (or source
patterns). The inversion then solves for the magnitude of each source region. Adjoint
methods compute the new source estimate using the adjoint of the model (i.e., the
transpose of the Jacobian) and apply it to the difference between the observed and
modeled tracer values.

Data assimilation and inverse modeling of atmospheric constituents are funda- 
mentally interrelated methodologies, so much so that the terms are often used in- 
terchangeably within the chemical inversion community. Both involve the use of 
transport models and observations of chemical constituent concentrations. They also 
have in common the use of Bayesian formalism, and require an estimate of model 
and observation error covariances. However, they differ in that data assimilation is 
generally concerned with obtaining the best possible estimate of the state of the
atmosphere (where the state refers to the space-time distribution of the chemical species),
while chemical inversion is concerned with estimating surface sources and sinks of the
species. The question arises as to whether these differences in purpose result in an
equivalent extraction of information from the observations. The answer to this will
depend in part on exactly which assimilation and inversion techniques are used. 

The Kalman filter [Kalman, 1960] produces an optimal estimate of the state of a
system in the minimum error sense when certain conditions are met. These include
assumptions of unbiased forecast and observation errors, Gaussian error statistics 
and linear dynamics. Each of these requirements is difficult to achieve in atmospheric 
data assimilation applications, but they can often be good approximations to real 
systems. For linear state estimation problems, the Kalman filter gives a minimum 
variance solution by minimizing a cost function that gives weights to the forecast and 
observations according to their relative covariances [Cohn, 1997]. The forecast error
covariance, $P^f$, is evolved by the linearized dynamics, and therefore contains current
information on error variance and the correlations between different locations. In
carrying out assimilation, non-zero correlations are used to spread the corrections
to the forecast to gridpoints near the observations. The resulting analysis error
covariance, $P^a$, then includes the current error variance and correlation lengths for the
analysis field. This approach is only valid for linear systems; the extended Kalman
filter (EKF) can be applied to nonlinear systems [Gelb, 1974; Jazwinski, 1970].

How can this error covariance information be used to improve the estimation of 
chemical sources? The Kalman filter is generally too computationally expensive for 
use in global three-dimensional data assimilation systems. There have been, however, 
some studies that use it on isentropic surfaces in the stratosphere [Menard et al.,
2000a, b; Auger and Tangborn, 2004]. These studies showed how the error correlation
information in $P^f$ can impact the success of the assimilation.


A direct comparison between inversion for source estimation and data assimilation
is difficult because the end product is different. One could, however, devise a way to
make a meaningful comparison by adding an extra step to one of the schemes so that 
both constituent concentration and source/sink are estimated. For example, after 
obtaining a new source/sink estimate using a Bayesian inversion, the model could be 
re-run to obtain an improved estimate of the constituent concentration state. Alter- 
natively, the analysis concentration field obtained through data assimilation could be 
used as an input to a source inversion scheme to obtain a new source estimate.

Kalman filtering has previously been used as a technique for inverting for sources 
and sinks. Hartley and Prinn [1993] defined a vector of source strengths as an extension
of the state space, so that the observation operator is just the linear transport
model, and the forecast error variance is then a measure of the uncertainty in the 
source estimate. This formulation required a perfect model (transport and chemistry) 
assumption. Gilliland and Abbitt [2001] developed an adaptive iterative Kalman fil- 
ter for source inversion in which time integrated emissions are treated as unobserved 
state variables. In this work they made use of observations that are only available 
over short time periods and showed how errors in initial concentration estimates can 
persist during the course of the assimilation. 

The value of combining data assimilation and source inversion is most obvious 
when using a differential inversion method. Assimilation spreads the observation in- 
formation to nearby grid points, creating the spatial variations needed to calculate 
spatial derivatives. Law [1999] used spline interpolation to spread the observations,
and Dargaville [2000] used a modified interpolation technique to invert CO2 observations
for a variety of regional sources. Neither of these works takes advantage of
the covariance propagation or tuning available in current constituent assimilation sys- 
tems. Furthermore, the mass-balance inversion methods are local, using only nearby 
grid points, and thus cannot gain any improvement from more distant observations. 

This work is motivated by the growth in the quantity of satellite-derived distribu- 
tions of atmospheric trace gases. Measurement of trace gases in the atmosphere has 
led to significant increases in efforts to incorporate these measurements into atmo- 
spheric transport models with the goal of obtaining improved estimates of their global 
distribution and of their sources and sinks. State estimation through the combination 
of observation and model output is generally referred to as data assimilation, while 
source/sink estimation is referred to as inverse modeling.

The present study examines a highly simplified system for top-down, or inverse, 
modeling. A simple two-dimensional advection model with an analytically specified 
wind field is used to compute atmospheric tracer concentrations from a specified 
source-sink distribution. A variety of sampling approaches are then adopted to ex- 
amine how accurately the original source-sink distribution can be retrieved in the 
presence of random errors in both observations and source model. An important 
aspect of the study is the application of data assimilation to produce analyses from 
the observations; a comparison is made between the source-sink distribution deduced
from analyses and that deduced from direct observations. It is thus a highly idealized
Observation System Simulation Experiment (OSSE), which is intended as a prelude to similar
experiments using more realistic systems. In section 2 we define the two-dimensional
transport model and in section 3 we introduce the Kalman filter for estimating the
constituent field. This is followed by the Bayesian Green’s function inversion procedure
for estimating chemical sources in section 4 and the new combined assimilation and inversion
scheme in section 5. Section 6 presents the results of the new system followed by the 
conclusions in section 7. 

2. Transport Model and Observing System 


We define the transport model as the solution to the linear two-dimensional
convection-diffusion equation

$$\frac{\partial c}{\partial t} + u \frac{\partial c}{\partial x} + v \frac{\partial c}{\partial y} = \alpha \left( \frac{\partial^2 c}{\partial x^2} + \frac{\partial^2 c}{\partial y^2} \right) + S - Lc \tag{1}$$

where $c$ is the mixing ratio, $(u, v)$ are the $(x, y)$ components of velocity, $\alpha$ is the
diffusivity, $S$ is the rate of production of $c$, and $L$ is the loss rate frequency of $c$. We
treat this system as non-dimensional, so all the variables are unitless. The boundary
conditions are periodic in $x$ and $y$, and the domain is of size $2\pi \times 2\pi$. The numerical
model employed is a Fourier-Galerkin scheme with Crank-Nicolson time-stepping.
The numerical solution is then written as 


$$\hat{c}_{k+1} = \Phi \hat{c}_k + \hat{S} \tag{2}$$

where $k$ is the time-step index, $\Phi$ represents the numerical model’s system matrix, and
the hat indicates that a variable or parameter is in spectral space.

The constituent field is related to its Fourier coefficients by the fast Fourier trans- 
form, represented by a matrix operator F, so that 

$$\hat{c}_k = F c_k. \tag{3}$$
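
As a minimal sketch of Equations (2) and (3), the Crank-Nicolson update can be written per Fourier mode. It assumes constant winds (as in the experiments reported here), so that the system matrix is diagonal in spectral space. The function names, the square grid, and the time step of 0.001 (inferred from 1000 timesteps per unit time) are illustrative assumptions rather than the authors’ implementation.

```python
import numpy as np

def crank_nicolson_factors(n, dt, u=4.0, v=2.0, alpha=0.02, loss=0.2):
    """Per-mode factors (phi, src) so that c_hat_next = phi * c_hat + src * s_hat."""
    k = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers, 2*pi-periodic domain
    kx, ky = np.meshgrid(k, k, indexing="ij")
    # Spectral symbol of -(u d/dx + v d/dy) + alpha * Laplacian - L
    lam = -1j * (u * kx + v * ky) - alpha * (kx**2 + ky**2) - loss
    phi = (1.0 + 0.5 * dt * lam) / (1.0 - 0.5 * dt * lam)
    src = dt / (1.0 - 0.5 * dt * lam)
    return phi, src

def run_model(source, nsteps, dt, **params):
    """Integrate from a zero field with a steady source; returns the physical-space field."""
    phi, src = crank_nicolson_factors(source.shape[0], dt, **params)
    s_hat = np.fft.fft2(source)                 # physical -> spectral
    c_hat = np.zeros_like(s_hat)
    for _ in range(nsteps):
        c_hat = phi * c_hat + src * s_hat       # Equation (2) with a diagonal system matrix
    return np.fft.ifft2(c_hat).real             # spectral -> physical
```

A spatially uniform unit source is a convenient check: only the zero-wavenumber mode is excited, and the field should approach (1 - exp(-L t)) / L.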

We define the evolution of the true constituent state as different from the transport 
model by a random model error, which implies that 

$$\hat{c}^t_{k+1} = \Phi \hat{c}^t_k + \hat{S} + \hat{b}_k, \tag{4}$$

where $\hat{b}_k$ is the Fourier coefficient vector of a zero-mean, Gaussian-distributed random
vector $b_k$. The model error is characterized by its covariance

$$Q_k = \langle b_k b_k^T \rangle. \tag{5}$$

The diagonal terms of $Q_k$ are the model error variance, $(\sigma^m)^2$, and are constant in time.

The observations ($c^o$) are taken from the true field, with a spatially uncorrelated
random measurement error, $\epsilon_k$. The observations are then

$$c^o_k = H_k c^t_k + \epsilon_k. \tag{6}$$

The observation errors are characterized by the diagonal observation error covariance
matrix

$$R_k = \langle \epsilon_k \epsilon_k^T \rangle, \tag{7}$$

which has an error variance of $(\sigma^o)^2$ along its diagonal and has a characteristic
correlation length scale of $l_c$.
The operator $H_k$ relates the true constituent field to the actual observation locations.
In the next two sections we relate state estimation using Kalman filtering to
source/sink estimation using synthesis inversion. 
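
The observation model of Equations (6) and (7) can be sketched as follows, assuming that each observation samples a single grid point, so that the observation operator is a selection matrix. The helper names and the flattened-state layout are hypothetical choices made for illustration.

```python
import numpy as np

def selection_operator(n_state, obs_index):
    """H picks the state entries at the observed grid indices."""
    obs_index = np.asarray(obs_index)
    H = np.zeros((obs_index.size, n_state))
    H[np.arange(obs_index.size), obs_index] = 1.0
    return H

def simulate_observations(c_true, H, sigma_o, rng):
    """Equation (6): sample the true field and add uncorrelated Gaussian noise."""
    noise = sigma_o * rng.standard_normal(H.shape[0])
    R = sigma_o**2 * np.eye(H.shape[0])         # Equation (7), diagonal covariance
    return H @ c_true + noise, R
```

Passing a seeded generator such as `np.random.default_rng(0)` keeps each twin experiment reproducible.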

The experiments presented in this paper will make use of synthetic observations 
which are obtained from an artificial “nature” run that differs from the model by 
some difference in the source plus some random errors in the constituent field, $b_k$.
We define this nature run as the “true” state of the system. 

The source in the nature run is defined by a constant quadratic function centered 
at the point (0.47,0.47) with a peak flux/area = 40, as shown in Figure 1(a). The 
constituent field that results from running the model (starting from a uniformly zero 
field) for 1000 timesteps (unit time of 1.0) is shown in Figure 2(a). In this example 
the velocity field is $u = 4$, $v = 2$, the diffusivity $\alpha = 0.02$, and the loss coefficient
L = 0.2. 

3. The Kalman Filter Algorithm 

The Kalman filter gives the minimum variance solution to the estimation of the 
state of the system from the model and observations when the errors are unbiased and 
Gaussian random vectors. It is also assumed that the error variance and correlation 
lengths for the model, observation and initial errors are accurately known. Since our 
system evolves in terms of Fourier coefficients, it is most computationally efficient to
evolve the error covariances in the same manner. If the observations are assimilated
into the system every $m$ timesteps, then the algorithm consists of the following steps.
The constituent forecast Fourier coefficients are updated from the previous constituent
analysis by $m$ steps,

$$\hat{c}^f_{k+m} = \Phi^m \hat{c}^a_k, \tag{8}$$

where $\Phi^m$ is defined as $m$ applications of the matrix $\Phi$. The forecast error covariance
(in spectral space) is propagated $m$ steps starting from the analysis error covariance
by

$$\hat{P}^f_{k+m} = \Phi^m \hat{P}^a_k (\Phi^m)^T + \hat{Q}_k, \tag{9}$$

where the covariance matrices have all been transformed to spectral space. The
analysis error covariance is determined (in physical space) at assimilation time from

$$P^a_k = (I - K_k H_k) P^f_k. \tag{10}$$

The Kalman gain matrix $K_k$, which determines the relative weights given to the
observations and forecast, is

$$K_k = P^f_k H_k^T (H_k P^f_k H_k^T + R_k)^{-1}. \tag{11}$$

Then the new state estimate, or analysis update, is given by

$$c^a_k = c^f_k + K_k (c^o_k - H_k c^f_k). \tag{12}$$
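
The covariance propagation and analysis steps can be sketched with dense matrices as below. This is an illustrative sketch, not the authors’ code: the function names are assumptions, the conjugate transpose is used so that a complex spectral system matrix is handled, and the gain is computed with a linear solve rather than an explicit inverse.

```python
import numpy as np

def propagate_covariance(Phi_m, P_a, Q):
    """Equation (9): P_f = Phi^m P_a (Phi^m)^H + Q (conjugate transpose for complex Phi)."""
    return Phi_m @ P_a @ Phi_m.conj().T + Q

def kalman_analysis(c_f, P_f, H, R, c_obs):
    """Equations (10)-(12) in physical space."""
    S = H @ P_f @ H.T + R                       # innovation covariance
    K = np.linalg.solve(S, H @ P_f).T           # K = P_f H^T S^{-1} (P_f and S symmetric)
    c_a = c_f + K @ (c_obs - H @ c_f)
    P_a = (np.eye(c_f.size) - K @ H) @ P_f
    return c_a, P_a, K
```

With two correlated grid points and a single observation of the first, the off-diagonal term of the forecast error covariance carries part of the correction to the unobserved point, which is the spreading behaviour described in the introduction.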

Cohn [1997] has summarized some of the important properties of the Kalman 
filter for distributed systems. These include the fact that the error covariances are
independent of the observation values, but are dependent on the observation locations
and errors. This means that as observations are assimilated, information on their 
impact on the analysis field is included in the analysis error covariances. Because 
the forecast error covariance is propagated forward starting from the analysis error 
covariance, it will also contain information on past observation locations and accuracy, 
ensuring that the weighting between forecast and observation takes into account past
as well as current information. 

4. Synthesis Inversion 

The terms synthesis and Green’s function inversion are often used interchangeably, 
though synthesis inversion is in fact a technique that uses pre-defined source patterns 
so as to reduce the computational cost of the inversion. The technique is based 
on the Green’s function method for solving differential equations through the use 
of an integral operator. The Green’s functions themselves are the set of
observations that would be obtained from a single point source (or a
linear combination of sources in the case of synthesis inversion) of unit strength. This
is done by running the transport model forward in time from some initial state, for 
each unit source. Estimates of the sources are obtained by comparing the Green’s 
function with the actual chemical tracer observations and carrying out the inversion. 
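
The construction of the Green’s functions can be sketched as a loop over unit sources. The sketch assumes a linear forward model; `forward` is a hypothetical callable that maps a source vector to the concentration field at the observation time, and `H` is an observation operator in matrix form.

```python
import numpy as np

def build_greens_matrix(forward, n_src, H):
    """Column j holds the observations produced by a unit source at grid point j."""
    G = np.zeros((H.shape[0], n_src))
    for j in range(n_src):
        unit = np.zeros(n_src)
        unit[j] = 1.0
        G[:, j] = H @ forward(unit)             # model response sampled at observation sites
    return G
```

Each column requires one forward run of the transport model, which is why synthesis inversion groups grid points into a smaller number of source patterns when the model is large.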

Synthesis inversion assumes that surface sources of a particular chemical species
will eventually be observed somewhere in the atmosphere; the algorithm therefore requires that
the lifetime of the species is long compared to the transport times. If chemical reaction
adds or removes a substantial fraction of the species during the time
of transport, the Green’s functions will not accurately represent the distribution of the
species that results from the surface sources. For this reason, synthesis inversion is
generally only used for long-lived species such as CO and CO2.

The standard nomenclature for chemical source inversion differs from that used
in data assimilation. In this paper, we will use the usual inversion notation [Enting,
2002], but will relate it to the data assimilation notation to help improve clarity.
Green’s functions are created by using a source of unit strength at each of the $N_x \times N_y$
grid points, and running the transport model forward in
time. Thus, each Green’s function is the solution to the transport model given a
single unit source. The set of all $N_x \times N_y$ Green’s functions is then combined to
create a Green’s function matrix, $G$ (of size $N_x N_y \times N_x N_y$). Given an existing estimate of
the sources ($z$), a set of observations ($c^o$), the error covariance for the observations
($X^{-1} = R$), and the error covariance for the source model ($W^{-1}$), the Green’s function
inversion yields the new source estimate ($S_{new}$) as

$$S_{new} = [G^T R^{-1} G + W]^{-1} [G^T R^{-1} c^o + W z] = [G^T X G + W]^{-1} [G^T X c^o + W z], \tag{13}$$

where $z$ is the a priori source estimate and $c^o$ is the observational data set. The error
covariance of the estimate $S_{new}$ is

$$[G^T X G + W]^{-1}. \tag{14}$$
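
Equations (13) and (14) translate directly into a few lines of linear algebra. The sketch below assumes dense matrices small enough to solve directly; the function name is illustrative, and `X` and `W` are passed as inverse covariances, as in the notation above.

```python
import numpy as np

def greens_inversion(G, X, W, c_obs, z):
    """Equations (13)-(14); X and W are inverse covariances (X = R^{-1})."""
    A = G.T @ X @ G + W
    s_new = np.linalg.solve(A, G.T @ X @ c_obs + W @ z)
    cov = np.linalg.inv(A)                      # Equation (14)
    return s_new, cov
```

When the observations and the prior are weighted equally and the Green’s function matrix is the identity, the estimate falls halfway between the prior and the observations, which makes a convenient sanity check.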

5. A Combined Kalman Filter and Synthesis Inversion Algorithm

In this new approach we combine the two schemes in a way that retains optimal 
characteristics of the Kalman filter with the formalism of Green’s function inversion. 
The Green’s function matrix is formed in the same manner, but instead of using 
observations directly in the inversion, they are assimilated using the Kalman filter, 
resulting in analyses that give a new estimate of the state of the atmosphere at each 
observation time. Then the analyses, $c^a$, are used at every grid point in place of
observations $c^o$, with error covariance $X^{-1} = P^a$. The new scheme for the inversion
is then:

$$S_{assim} = [G^T (P^a)^{-1} G + W]^{-1} [G^T (P^a)^{-1} c^a + W z], \tag{15}$$

where $S_{assim}$ is the new source estimate that uses the assimilated observations. Since
this inversion uses the analysis $c^a$, the inverse of the analysis error covariance replaces
$X$ from (14), and the new estimated error covariance is

$$[G^T (P^a)^{-1} G + W]^{-1}. \tag{16}$$
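
A sketch of Equations (15) and (16) is given below. It assumes that the analysis field and its (symmetric, non-singular) error covariance are already available from the Kalman filter, and it applies the inverse of the analysis error covariance through linear solves instead of forming it explicitly. The function name is an illustrative assumption.

```python
import numpy as np

def assimilated_inversion(G, P_a, W, c_a, z):
    """Equations (15)-(16) using solves against P_a instead of forming its inverse."""
    PinvG = np.linalg.solve(P_a, G)             # (P^a)^{-1} G
    A = G.T @ PinvG + W
    rhs = PinvG.T @ c_a + W @ z                 # G^T (P^a)^{-1} c^a + W z (P^a symmetric)
    s_assim = np.linalg.solve(A, rhs)
    cov = np.linalg.inv(A)                      # Equation (16)
    return s_assim, cov
```

Because the analysis error covariance is generally not diagonal, the correlations built up by the filter enter the inversion through the solve against `P_a`.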

The advantage of this approach is that the Kalman filter evolves the error covariance
using the linear model. This results in both forecast and analysis error
covariances that contain correlations that are affected by transport and diffusion. In
particular, information from the source region is transported downstream by advec- 
tion so that forecast errors should be correlated over greater distances. The estimated 
source error covariances are discussed further in the next section. 

6. Numerical Experiments 

The experiments presented in this paper make use of synthetic observations which 
are obtained from an artificial “nature” run that differs from the model by some 
difference in the source plus some random errors in the constituent field. 

The source in the nature run is defined by a quadratic function centered at the
point (0.47, 0.47) with a peak flux of 40 (in units of $dc/dt$ per unit area), as shown in Figure 1(a). The
constituent field that results from running the model (starting from a uniformly zero
field) for 1000 timesteps (unit time of 1.0) is shown in Figure 2(a). In this example
the velocity field is $u = 4$, $v = 2$, the diffusivity $\alpha = 0.02$, and the loss coefficient
$L = 0.2$.

We have carried out a series of runs to compare the accuracy of the Green’s func- 
tion inversion by directly using the observation networks with the scheme outlined 
in section 5, which uses the analysis field instead of the observational input to the 
inversion scheme. We will refer to these inversions as using direct observations and
assimilated observations, respectively. Testing of the algorithms and code includes cases
with observations at every point and with only two observations, which are shown in
Figure 4(a-d). In the former case, the source inversion using direct observations (a)
and assimilated observations (b) produced identical results, which capture the true 
source to within the observational error. This implies that when the observations are 
essentially the entire state, then the assimilation adds nothing to the accuracy of the 
inversion. In the latter case, the two schemes (c-d) were nearly equally unable to im- 
prove on the first guess of the source. This test shows that little or no improvements 
to the inversion can be made when the observations are too sparse (and the system 
is not observable). 

Our interest is in cases that lie between these two extremes, so we have carried 
out ensemble experiments with a variety of source models and observing networks,
including global (satellite) and ground based observations (in situ). The observation 
networks are shown in Figures 5(a,b), and all of the observations are available at 
every assimilation time. 

The model uses two possible a priori source estimates, which are shown in Figure 
1. Both of these source estimates are unbiased in the sense that the total flux is 
exactly the same as that in Figure 1(a), but have either an error in location (Figure 
1(b)) or in the localization or spread (Figure 1(c)). We refer to these source errors as 
source location error and source spread error respectively. These two models also do 
not account for the random source/sink term in Equation (4). 

For each source model and each observing network, we have carried out 20 twin
experiments using perturbed initial conditions. Twin experiments are essentially
simulations that are identical in every aspect except for randomly perturbed initial
conditions. This allows us to obtain meaningful statistics of the assimilation and 
inversion results. In each case the model is run for 1000 timesteps, which is roughly 
the time required for constituents to be transported about 2/3 of the way across 
the domain. The results are presented by comparing the known true source and 
constituent field with the model output field and assimilated (analysis) fields as well 
as the resulting chemical source inversion for each case. We compare the source 
inversion using the observations directly, and by first assimilating every 20 timesteps 
using the Kalman filter as described in the previous section. In all of the experiments, 
the parameters used are velocities $u = 4$, $v = 2$, diffusivity $\alpha = 0.02$, and loss
rate coefficient $L = 0.2$. The observation error standard deviation is $\sigma^o = 0.0014$,
the model error standard deviation in Equation (4) is $\sigma^m = 0.01$, and the model
correlation length scale is $l_c = 0.1$.

We present detailed results only for the model with source spread error and then 
summarize all the cases at the end of this section. Labels used in the text for each 
experiment are defined in Table 1. 

6.1 Concentration field 

Figure 2(b) shows the concentration field that results from running the model with 
source spread error for 1000 timesteps without assimilation. As one would expect, the 
impact of the source is wider than in the true state (Figure 2(a)), and lacks the small
scale structure that comes from the random source/sink term in Equation (4). We plot
the RMS error for the concentration field as a function of time for this case, and also
for the assimilation cases using the in situ (SIA) and satellite observations (SSA) in
Figure 6. This figure gives an indication of the relative amounts of information in the
observing networks, which will be important in the success of the source inversions. 
With the model alone, the observations have no impact on the constituent field, 
and the resulting RMS error grows continuously as a result of both the local systematic
source model error and the random model error. The errors are consistently smaller 
for the satellite observation network, which has more observations, but fewer in the 
vicinity of the source. The concentration field obtained from assimilation of satellite 
observations into the source spread model (SSA) is shown in Figure 2(c). The field 
has narrowed and even contains some of the small scale features present in the true 
field. Thus the assimilation, while not making any correction to the source, changes
the downstream structure of the field to more closely resemble the true field. The
difference between the assimilation and true final states ($c^a - c^t$), shown in Figure 3,
indicates that the analysis field still retains errors on the order of 20%.

6.2 Source Inversion 

For each experiment, a source inversion is carried out using the Green’s function 
algorithm, with and without assimilation. The ensemble of twin experiments is used
to determine the mean and standard deviation errors relative to the true source. The
predicted source inversion error covariances, Equations (14) and (16), are valid when
the errors are Gaussian and unbiased. We expect that if the model and observation
errors are unbiased, then the source inversion should also be unbiased. Figure 7 shows
the predicted error variances for the source inversions (Equations (14) and (16)),
without and with assimilation (SSN and SSA, respectively), in a one-dimensional slice
through the source
region. The predicted errors for the inversion with assimilation are as much as an
order of magnitude smaller than the direct inversion errors.

The ensemble mean of the inverted source is defined as 

$$\mu_{inv} = \langle S_{inv} \rangle, \tag{17}$$

where the ensemble is the 20 twin experiments run for each set of parameters. We
define the mean inversion error as

$$\mu^e_{inv} = \mu_{inv} - S_{true}. \tag{18}$$

While all of the errors are globally unbiased, the steady source term has a local 
bias in the sense that over a long period of time the source at one location can be 
consistently too large or too small. For example, in the model with source spread 
error, the flux is consistently too low at the center of the source and is too large near 
the edge of the source. The total from these sources is the same as the true source 
total, and the random or short term source/sink term also has zero mean. The mean
inversion error, $\mu^e_{inv}$, is therefore an indication of the local and global bias, to the
extent that they differ from the true source. In all cases, the ensemble mean inverted
sources are zero far from the true source, so we only plot in the vicinity of the source
($0 < x < 1$; $0 < y < 1$).

Figure 8 shows the mean inversion errors that result from using the source model
with spread error. In Figure 8(a), the inversion without assimilation and with in situ
observations (SIN) is seen to have a mean source that is locally overestimated by
as much as 50% ($x = 0.45$, $y = 0.3$) and underestimated by up to 50% ($x = 0.6$,
$y = 0.45$). When satellite observations are used directly in the inversion (SSN),
Figure 8(c), the maximum mean error is also about 50%.

The inversion using assimilated in situ observations (SIA), Figure 8(b), has a particularly
large bias at the center of the source (about 70%), while the inversion using assimilated
satellite observations (SSA), Figure 8(d), is significantly closer to the true source (20% maximum
mean error). However, the inversion without assimilation using in situ observations
(SIN), Figure 8(a), results in two spurious constituent sinks near the source. This can be
seen from the negative mean errors around $y = 0.5$.

The mean inversion errors described above only tell us whether there is any system- 
atic difference between the inverted source and true source. The random component 
of the error is represented by the error standard deviation 

$$\sigma_{inv} = \langle (\epsilon_s - \mu)^2 \rangle^{1/2} = \langle (S_{invert} - S_{true} - \mu)^2 \rangle^{1/2}, \tag{19}$$

where $\epsilon_s = S_{invert} - S_{true}$ and $\mu = \langle S_{invert} - S_{true} \rangle$. We calculate the error standard
deviation at each grid point, and plot the results in Figure 9 using the same source
model and observations as in Figure 8. The error standard deviations for inversions
without assimilation are consistently larger than those with assimilation, and in some 
cases the difference can be an order of magnitude. This is important because source 
inversion is not generally done using ensembles, so that the error standard deviation 
can be a significant contribution to the inversion error. The difficulty of carrying out 
ensembles of source inversions when using global models is due to the high compu- 
tational cost, particularly when many source regions are defined. These results show 
that the random component is significantly larger than the systematic component 
for the inversions using the observations directly (Figure 9a, c). This implies that a 
single inversion that uses direct observations will have significant uncertainty in the 
resulting source estimates. The error standard deviations for inversions using assimilated
observations (Figure 9b, d) are much smaller than those in the direct inversion cases.
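
The split of the inversion error into a systematic part (the ensemble mean error) and a random part (the error standard deviation of equation 19) can be sketched as follows. This is a minimal illustration rather than the code used for the experiments: the function name, the array shapes and the synthetic ensemble are all assumptions, and the angle brackets are interpreted as a plain average over ensemble members (normalization by N, not N - 1).

```python
import numpy as np

def ensemble_error_stats(s_invert, s_true):
    """Mean error and error standard deviation at each grid point.

    s_invert: inverted sources, shape (n_members, ny, nx) (assumed layout)
    s_true:   true source, shape (ny, nx)
    """
    eps = s_invert - s_true                  # epsilon_s for every ensemble member
    mu = eps.mean(axis=0)                    # systematic component (mean error)
    sigma = np.sqrt(((eps - mu) ** 2).mean(axis=0))  # random component, eq. (19)
    return mu, sigma

# Synthetic stand-in for 20 twin experiments on a tiny 4 x 4 grid.
rng = np.random.default_rng(0)
s_true = np.zeros((4, 4))
s_true[1, 2] = 40.0                          # peak flux of the true source
members = s_true + 5.0 + rng.normal(0.0, 2.0, size=(20, 4, 4))

mu, sigma = ensemble_error_stats(members, s_true)
print(mu.round(1))
print(sigma.round(1))
```

Because the true source is the same for every member in this sketch, the random component reduces to the ensemble spread of the inverted sources, which is why a single inversion inherits that spread as uncertainty.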

We summarize the results of the ensembles of assimilation and inversion calcu- 
lations in Table 1, which lists the value of the peak mean flux, the maximum mean 
error ($\mu_{inv}$), error standard deviation ($\sigma_{inv}$), and the error in the location of the peak
mean flux. Overall, the results show that the satellite observations result in sub- 
stantially better inversion accuracy than the in situ observations (Figures 8, 9 and 
Table 1). This is most likely the result of the fact that both assimilation and inver- 
sion can make use of the greater number of more distant observations to produce a 
more accurate source estimate. Comparisons between inversion using observations 
directly and those using the assimilated observations are less straightforward. Direct
inversion estimates the mean peak flux more accurately when in situ observations are
used while inversion of the assimilated observations is more accurate when satellite 
observations are used (Table 1 - mean peak flux). If we consider the maximum mean 
error (which is not generally at the same location as the peak), the inversion with 
assimilated observations is more accurate in 3 of the 4 cases (Table 1 - maximum 
mean error). 

When the model with location error is used, the direct inversion of observations
accurately predicts the peak flux location using either in situ or satellite observations.
The inversion using assimilated observations is successful in this regard only when
satellite observations are used. Finally, the variability in the solution is far smaller
when the observations are first assimilated, as indicated by the large error standard
deviations in the direct inversions (Table 1 - error standard deviation). In addition,
the direct inversion created substantial spurious sources and sinks, particularly
when using the source model with location error (Figure 8a, c).

7. Conclusions 

We have considered the question of whether assimilating chemical tracer observations
into a transport model before carrying out the inversion improves the
accuracy of the source estimation. The results presented here show that first assimilating
the observations using a Kalman filter reduces the random error by factors between
2 and 15 for the cases studied. Improvements to the systematic component of
error were less consistent, with decreases in the maximum mean error in most cases,
but a less accurate prediction of the mean peak flux. The direct inversion of observations
results in spurious sources and/or sinks, while the case with assimilation does
not. In each case the model or first guess source is globally unbiased, but has a local 
bias. 

Because Bayesian source inversion is a statistical weighting of model and obser- 
vations, the inversion process can never completely overcome any systematic errors. 
Thus the actual inversion errors are much larger than the predicted errors (Figure 
7). Additionally, when directly inverting from the observations, the response of the 
inversion algorithm to these biases is generally to generate spurious sinks in part of 
the domain while overestimating the source in other parts. When the observations 
are assimilated first, this tendency is greatly reduced. It is possible that the systematic
error in the assimilation could be eliminated using a bias correction scheme
[Lamarque et al., 2004].
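
Why a statistical weighting cannot remove a systematic error, and why the predicted error then understates the actual error, can be seen in a scalar toy problem. The sketch below is an illustrative assumption and not the inversion scheme of this paper: it uses the standard linear Gaussian update for a single source value, and the numbers (a prior bias of 10 and equal prior and observation variances of 25) are hypothetical.

```python
import numpy as np

def scalar_bayes_update(s_prior, var_prior, y, var_obs):
    """Standard scalar Gaussian update of a prior source value with one observation."""
    w = var_prior / (var_prior + var_obs)    # weight given to the observation
    s_post = s_prior + w * (y - s_prior)     # posterior source estimate
    var_post = (1.0 - w) * var_prior         # predicted (posterior) error variance
    return s_post, var_post

s_true = 40.0            # true source value
prior_bias = 10.0        # hypothetical systematic error in the first guess
var_prior = 25.0         # assumed prior error variance
var_obs = 25.0           # assumed observation error variance

# Many independent noisy observations of the true value, each inverted separately.
rng = np.random.default_rng(1)
y = s_true + rng.normal(0.0, np.sqrt(var_obs), size=100_000)
s_post, var_post = scalar_bayes_update(s_true + prior_bias, var_prior, y, var_obs)

mean_error = s_post.mean() - s_true                   # bias that survives the update
actual_rms = np.sqrt(np.mean((s_post - s_true) ** 2))  # actual error of the estimates
predicted_std = np.sqrt(var_post)                      # error predicted by the update

print(mean_error, actual_rms, predicted_std)
```

In this toy setting the update halves the prior bias but does not remove it, and the predicted error variance does not depend on the bias at all, so the actual error exceeds the predicted one.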

Most striking is the reduction in the error standard deviation that results from the 
assimilation. This means that the accuracy of a single source inversion (as opposed 
to the ensemble used here) is greatly enhanced by assimilating the observations. The 
primary reason for this improvement is the more accurate estimate of the error covari- 
ance provided by the Kalman filter and the spreading (or smoothing) of observational 
information. While it is difficult to compare the performance of this simplified system 
with other inversion systems, Kaminski et al. [2001] showed that errors that result
from aggregating source regions in synthesis inversion can be on the order of the
emissions themselves. We can therefore state that the reductions found in the present 
paper are significant in comparison. 
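
The spreading of observational information mentioned above can be illustrated with a single Kalman filter analysis step on a small one-dimensional grid. This is a sketch under stated assumptions, not the filter configuration used in the experiments: the grid size, the Gaussian-shaped forecast error covariance, the observation error variance and the function name `kalman_analysis` are all hypothetical.

```python
import numpy as np

def kalman_analysis(x_f, P_f, H, y, R):
    """Standard Kalman filter analysis step for a linear observation operator H."""
    S = H @ P_f @ H.T + R                    # innovation covariance
    K = np.linalg.solve(S, H @ P_f).T        # gain K = P_f H^T S^-1 (P_f, S symmetric)
    x_a = x_f + K @ (y - H @ x_f)            # analysis state
    P_a = (np.eye(len(x_f)) - K @ H) @ P_f   # analysis error covariance
    return x_a, P_a

n = 11
grid = np.arange(n)
# Assumed forecast error covariance: unit variance, Gaussian correlation, length 2.
P_f = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 2.0) ** 2)

H = np.zeros((1, n))
H[0, 5] = 1.0                                # one observation at the central point
x_f = np.zeros(n)                            # forecast concentration
y = np.array([1.0])                          # observed concentration
R = np.array([[0.1]])                        # assumed observation error variance

x_a, P_a = kalman_analysis(x_f, P_f, H, y, R)
print(x_a.round(3))
print(np.diag(P_a).round(3))
```

A single observation changes the analysis at neighbouring grid points and lowers their error variance as well, through the off-diagonal terms of the forecast error covariance. An inversion that uses the analysis therefore sees a smoother and better-characterized field than one that uses the isolated observation.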

The Kalman filter remains a diagnostic tool and is still too computationally expensive
for operational data assimilation systems, but many of its advantages can be
translated to other algorithms. Most notably, the Ensemble Kalman filter (EnsKF) is
being implemented in large scale atmospheric systems, including trace gas assimilation 
systems (Arellano et al., 2007). There are also a number of suboptimal Kalman filter 
algorithms that show some promise for reducing the computational load in evolving 
error covariances. Finally, even assimilation systems that do not evolve error covariances
generally rely on covariance tuning to improve the forecast error estimates. This
also acts to improve the inversion computation through improved error statistics.

Figure Captions


Figure 1 - (a) True source flux, P, which is a quadratic function centered at the
position (0.47, 0.47), with a maximum flux/area of 40 at its center, and a priori source
estimates with (b) location error and (c) spread error. The source with location error
is centered about 0.3 units from the true source center and the source with spread
error has the correct center but twice the diameter of the true source. Note that in
each of these plots, only part of the entire domain of 2π × 2π is shown.

Figure 2 - Concentration field at t = 1.0 (after 1000 timesteps) that results from (a)
model with true source, C; (b) model with source spread error; (c) assimilation
run using the model with spread error and the satellite observing network.
The random part of the field in (a) is due to the time varying part of the source term
P, while it is due to the assimilated observations in (c). The center of the source
region is represented by a black dot in each panel.

Figure 3 - Contour plot of analysis field minus true field ($c^a - c^t$) at the end of 1000
timesteps. The contour interval is 0.3. The largest errors are around 0.7, and occur
near the source term (x = 0.47, y = 0.47).

Figure 4 - Mean source errors from the inversion for the extreme cases of two observations
(a),(b) and observations at every grid point (c),(d). Panels (a) and (c) are
for direct inversion of the observations, and (b) and (d) are for inversion of assimilated
observations.


Figure 5 - In Situ (a) and Satellite (b) observation locations. Observation locations 
are the same at each analysis time. 

Figure 6 - RMS error in the concentration field relative to the nature run for the model 
with spread error. The curves shown are for model only (solid line), assimilation of 
in situ observations (dashed line) and assimilation of satellite observations (dash-dot 
lines). 

Figure 7 - Predicted error variance for the source inversion for the cases SSA (solid 
line) and SSN (dash-dot) along a slice of the source region. 

Figure 8 - Ensemble mean errors in source estimates from synthesis inversion using: 
(a) in situ observations (class SIN), (b) assimilated in situ observations (SIA), (c) 
satellite observations (SSN), and (d) assimilated satellite observations (SSA). All 
cases use the model with spread error.

Figure 9 - Error standard deviation of source estimates from synthesis inversion using: 
(a) in situ observations, (b) assimilated in situ observations, (c) satellite observations 
and (d) assimilated satellite observations.

Table 1

Model with spread error

Observations  Assimilation  Label  Mean peak flux  Max. mean error  Error std. dev.  Dist. from true peak
In situ       No            SIN    45              25               150              0
In situ       Yes           SIA    12              30               10               0
Satellite     No            SSN    50              12               50               0
Satellite     Yes           SSA    32              9                24               0

Model with location error

Observations  Assimilation  Label  Mean peak flux  Max. mean error  Error std. dev.  Dist. from true peak
In situ       No            LIN    70              50               170              0.05
In situ       Yes           LIA    38              35               9                0.33
Satellite     No            LSN    38              20               55               0
Satellite     Yes           LSA    35              10               22               0


Table 1 - Summary of ensemble results for the assimilation and inversion for the
different observation and model types, including the mean peak flux, maximum mean
error, peak error standard deviation and distance of the peak flux from the true
location. The true peak flux is 40 (at x = 0.47, y = 0.47) and the model with
location error (at x = 0.7, y = 0.4) is a distance of 0.33 from the true location. The
labels identify which model and observation type are used in each set of experiments.
All values are non-dimensional and the errors presented are absolute.
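
As a worked check on the error standard deviation column of Table 1, the relative reduction obtained by assimilating first can be computed for each pair of experiments. The short script below is only an arithmetic illustration; the values are copied from the table and the variable names are arbitrary.

```python
# Peak error standard deviations copied from Table 1 (non-dimensional).
sigma_direct = {"SIN": 150.0, "SSN": 50.0, "LIN": 170.0, "LSN": 55.0}
sigma_assim = {"SIA": 10.0, "SSA": 24.0, "LIA": 9.0, "LSA": 22.0}

# Each direct inversion is paired with its assimilated counterpart.
pairs = [("SIN", "SIA"), ("SSN", "SSA"), ("LIN", "LIA"), ("LSN", "LSA")]

reductions = {}
for direct, assim in pairs:
    # Percentage decrease in error standard deviation when assimilating first.
    reductions[(direct, assim)] = 100.0 * (1.0 - sigma_assim[assim] / sigma_direct[direct])
    print(f"{direct} -> {assim}: {reductions[(direct, assim)]:.1f}% reduction")
```

The four reductions fall between roughly 52% and 95%, in line with the 50% to 95% range quoted in the abstract.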

Chemical Source Inversion Using Assimilated Constituent
Observations in an Idealized Two-dimensional System

Popular Summary

This paper investigates the possibility of using data assimilation as a way to improve the estimation
of sources and sinks of trace gases. Traditionally this is done by source inversion, which
involves combining observations of chemical species with a chemical transport model (CTM). 
The inversion is done by minimizing the differences between the observed species concentrations 
and what the CTM predicts and allowing changes in the source fluxes. Source inversion differs 
from data assimilation in that it does not generally produce a new estimate of the trace gas
field. The value of this state estimation is that it can be used as an initial condition for later 
stages of the model run. This means that information from earlier observations can have an 
impact on later stages of the assimilation. 

We therefore have proposed using the assimilated observations, or analyses, as the observa- 
tional input into the source inversion scheme. This will make use of the best estimate of the 
state of the constituent field, with the goal of making a new estimate of the chemical sources that 
uses information from a long history of observations. Our numerical experiments use a Kalman 
filter, which also produces error estimates of the trace gas forecasts and analyses. These can also 
be used in the inversion, which requires accurate knowledge of both observational and model 
error statistics. The methodology is tested on a simplified two-dimensional chemical transport 
system, using simulations of local (in situ) and global (satellite) observations. We find that
assimilating the observations before carrying out the inversion results in a lower random error
component and a reduction in the creation of spurious sources and sinks.



